NSF PAR Search | NSF Public Access Repository


Mutual information and the encoding of contingency tables

https://doi.org/10.1103/PhysRevE.110.064306

Jerdee, Maximilian; Kirkley, Alec; Newman, M. E. J. (December 2024, Physical Review E)

Mutual information is commonly used as a measure of similarity between competing labelings of a given set of objects, for example to quantify performance in classification and community detection tasks. As argued recently, however, the mutual information as conventionally defined can return biased results because it neglects the information cost of the so-called contingency table, a crucial component of the similarity calculation. In principle the bias can be rectified by subtracting the appropriate information cost, leading to the modified measure known as the reduced mutual information, but in practice one can only ever compute an upper bound on this information cost, and the value of the reduced mutual information depends crucially on how good a bound is established. In this paper we describe an improved method for encoding contingency tables that gives a substantially better bound in typical use cases, and approaches the ideal value in the common case where the labelings are closely similar, as we demonstrate with extensive numerical results.
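As a point of reference for the abstract above, the following is a minimal sketch of the conventional (unreduced) mutual information computed from a contingency table of two labelings. It does not implement the reduced mutual information or the improved encoding described in the paper. The function names and the choice to report bits per object are assumptions made for illustration.

```python
import math
from collections import Counter


def contingency_table(labels_a, labels_b):
    """Count how many objects carry each pair of labels (r, s)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same objects")
    return Counter(zip(labels_a, labels_b))


def mutual_information(labels_a, labels_b):
    """Conventional mutual information in bits per object.

    Illustrative only: this is the standard definition, with no
    correction for the information cost of the contingency table.
    """
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must be non-empty")
    table = contingency_table(labels_a, labels_b)
    row_sums = Counter(labels_a)  # objects per label in the first labeling
    col_sums = Counter(labels_b)  # objects per label in the second labeling
    total = 0.0
    for (r, s), count in table.items():
        total += (count / n) * math.log2(count * n / (row_sums[r] * col_sums[s]))
    return total


if __name__ == "__main__":
    print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # identical labelings
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent labelings
```

Two identical balanced two-group labelings share one bit per object, while the two independent labelings in the second call share none.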
Full Text Available
Improved estimates for the number of non-negative integer matrices with given row and column sums

https://doi.org/10.1098/rspa.2023.0470

Jerdee, Maximilian; Kirkley, Alec; Newman, M. E. J. (January 2024, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences)

The number of non-negative integer matrices with given row and column sums features in a variety of problems in mathematics and statistics but no closed-form expression for it is known, so we rely on approximations. In this paper, we describe a new such approximation, motivated by consideration of the statistics of matrices with non-integer numbers of columns. This estimate can be evaluated in time linear in the size of the matrix and returns results of accuracy as good as or better than existing linear-time approximations across a wide range of settings. We show that the estimate is asymptotically exact in the regime of sparse tables, while empirically performing at least as well as other linear-time estimates in the regime of dense tables. We also use the new estimate as the starting point for an improved numerical method for either counting or sampling matrices with given margins using sequential importance sampling. Code implementing our methods is available.
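To make the counted quantity in the abstract above concrete, here is a small brute-force sketch that enumerates non-negative integer matrices with given row and column sums. It is not the linear-time approximation or the sequential importance sampling method from the paper. It is an exact count that is practical only for very small margins, and the function names are hypothetical.

```python
from functools import lru_cache


def bounded_compositions(total, caps):
    """Yield tuples x with sum(x) == total and 0 <= x[i] <= caps[i]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in bounded_compositions(total - first, caps[1:]):
            yield (first,) + rest


def count_tables(row_sums, col_sums):
    """Exactly count non-negative integer matrices with the given margins.

    Illustrative brute force: fills the matrix one row at a time and
    memoizes on the column sums still to be filled.
    """
    row_sums = tuple(row_sums)
    if sum(row_sums) != sum(col_sums):
        return 0

    @lru_cache(maxsize=None)
    def count(i, remaining):
        if i == len(row_sums):
            return 1 if all(c == 0 for c in remaining) else 0
        total = 0
        for row in bounded_compositions(row_sums[i], remaining):
            leftover = tuple(c - x for c, x in zip(remaining, row))
            total += count(i + 1, leftover)
        return total

    return count(0, tuple(col_sums))


if __name__ == "__main__":
    print(count_tables([2, 2], [2, 2]))        # 3
    print(count_tables([1, 1, 1], [1, 1, 1]))  # 6, the 3x3 permutation matrices
```

The running time grows very quickly with the size of the margins, which is the reason fast approximations of the kind the abstract describes are of interest.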
Full Text Available
Representative community divisions of networks

https://doi.org/10.1038/s42005-022-00816-3

Kirkley, Alec; Newman, M. E. J. (December 2022, Communications Physics)

Methods for detecting community structure in networks typically aim to identify a single best partition of network nodes into communities, often by optimizing some objective function, but in real-world applications there may be many competitive partitions with objective scores close to the global optimum and one can obtain a more informative picture of the community structure by examining a representative set of such high-scoring partitions than by looking at just the single optimum. However, such a set can be difficult to interpret since its size can easily run to hundreds or thousands of partitions. In this paper we present a method for analyzing large partition sets by dividing them into groups of similar partitions and then identifying an archetypal partition as a representative of each group. The resulting set of archetypal partitions provides a succinct, interpretable summary of the form and variety of community structure in any network. We demonstrate the method on a range of example networks.
Full Text Available
Clustering of heterogeneous populations of networks

https://doi.org/10.1103/PhysRevE.105.014312

Young, Jean-Gabriel; Kirkley, Alec; Newman, M. E. J. (January 2022, Physical Review E)

Full Text Available
The friendship paradox in real and model networks

https://doi.org/10.1093/comnet/cnab011

Cantwell, George T.; Kirkley, Alec; Newman, M. E. J. (April 2021, Journal of Complex Networks)
Estrada, Ernesto (Ed.)
The friendship paradox is the observation that the degrees of the neighbours of a node in any network will, on average, be greater than the degree of the node itself. In common parlance, your friends have more friends than you do. In this article, we develop the mathematical theory of the friendship paradox, both in general as well as for specific model networks, focusing not only on average behaviour but also on variation about the average and using generating function methods to calculate full distributions of quantities of interest. We compare the predictions of our theory with measurements on a large number of real-world network datasets and find remarkably good agreement. We also develop equivalent theory for the generalized friendship paradox, which compares characteristics of nodes other than degree to those of their neighbours.
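The basic comparison behind the friendship paradox can be checked on a toy network. The sketch below is only an illustration of the definition in the abstract above, not the generating function theory developed in the article. The adjacency-dictionary representation, the function name, and the choice to skip isolated nodes are assumptions.

```python
def friendship_paradox_summary(adjacency):
    """Return (mean degree, mean over nodes of the average neighbour degree).

    `adjacency` maps each node to the set of its neighbours in an
    undirected network. Isolated nodes are skipped (an assumption of
    this sketch) because they have no neighbours to average over.
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    nodes = [node for node in adjacency if degree[node] > 0]
    if not nodes:
        raise ValueError("network has no nodes with neighbours")
    mean_degree = sum(degree[v] for v in nodes) / len(nodes)
    mean_neighbour_degree = sum(
        sum(degree[u] for u in adjacency[v]) / degree[v] for v in nodes
    ) / len(nodes)
    return mean_degree, mean_neighbour_degree


if __name__ == "__main__":
    # A star network: one hub connected to three leaves.
    star = {
        "hub": {"a", "b", "c"},
        "a": {"hub"},
        "b": {"hub"},
        "c": {"hub"},
    }
    print(friendship_paradox_summary(star))  # (1.5, 2.5)
```

In the star example each leaf has a single neighbour of degree 3, so the neighbours' average degree (2.5) exceeds the nodes' own average degree (1.5).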
Full Text Available
Belief propagation for networks with loops

https://doi.org/10.1126/sciadv.abf1211

Kirkley, Alec; Cantwell, George T.; Newman, M. E. J. (April 2021, Science Advances)
Belief propagation is a widely used message passing method for the solution of probabilistic models on networks such as epidemic models, spin models, and Bayesian graphical models, but it suffers from the serious shortcoming that it works poorly in the common case of networks that contain short loops. Here, we provide a solution to this long-standing problem, deriving a belief propagation method that allows for fast calculation of probability distributions in systems with short loops, potentially with high density, as well as giving expressions for the entropy and partition function, which are notoriously difficult quantities to compute. Using the Ising model as an example, we show that our approach gives excellent results on both real and synthetic networks, improving substantially on standard message passing methods. We also discuss potential applications of our method to a variety of other problems.
Full Text Available
Balance in signed networks

https://doi.org/10.1103/PhysRevE.99.012320

Kirkley, Alec; Cantwell, George T.; Newman, M. E. J. (January 2019, Physical Review E)
